-
-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filterFiles to lib and lib.sources #188301
Conversation
ursi
commented
Aug 25, 2022
Get all files (recursively) that pass a filter function. Unlike other source | ||
filtering functions, this will not include empty directories in the final | ||
derivation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function always scans the entire directory tree. In large code bases, you can improve performance by filtering the second argument. |
(assuming you're flipping the parameters as suggested below, and we fix sourceLike
support)
includedPaths = | ||
builtins.filter | ||
filter | ||
(lib.filesystem.listFilesRecursive src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not actually support all sourceLike
values, such as the result of cleanSource
. Types are a bit vague in Nix documentation, so I really can't blame you for not picking up on this subtlety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried
lib.filesystem.listFilesRecursive (lib.cleanSource ./.)
in nixpkgs and it worked.
lib/sources.nix
Outdated
(path': lib.hasPrefix path (toString path')) | ||
includedPaths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the source O(#files
²) . Maybe something the memoizeSource
function could be useful.
lib/sources.nix
Outdated
(lib.filesystem.listFilesRecursive src); | ||
|
||
name = | ||
# Fix the problem of store paths names causing a rebuild every time they change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This problem is not unique to this function, so it should be moved somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I good to pull this out into its own function and submit a PR with both, or should it be pulled out and put into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is no good, because it reintroduces the parent path impurity.
Unfortunately it's up to us to make sure that the evaluation of a directory does not depend on the location of that directory. That location includes the basename: someone may use a different checkout, or a flake source may resolve to a different location depending on the entire project through the store path hash, thereby largely defeating the purpose of source filtering.
name
should be "source"
unless the caller specifies something else. In this case you can let the caller set the name afterwards with cleanSourceWith
. @infinisil is working on #112083, which will add a more convenient function for setting the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate on what exactly "largely defeats the purpose of source filtering"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies; I might have responded too quickly here. Nonetheless I am not sure if this new solution should be applied to this new function, and whether this new function fits the larger design of the source combinator library. I'll defer to @infinisil who's currently more up to speed about all of that, including lazy trees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ursi I'm not entirely sure I understand this name handling here. What exactly does it do and what use cases are affected by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It addresses the problem caused by using builtins.filterSource
on a generated source. Such a source will have a name of the form /nix/store/c59y63zj2pgjx87184kv1kyxqqs9b5sl-source
, which will be updated any time that source changes (A particularly common example of this is the source ./.
inside a flake). Because filterSource
uses the base name of the file for the name of the new derivation, this means that whenever the generated source changes, the source created from filterSource
changes, even if the files are exactly the same - it has a different name.
This function checks to see if you're filtering a source from a derivation, and if you are, it just uses the generic name "source" instead of getting the name from the base name like filterSource
would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so on one hand this shouldn't be a problem anymore with lazy trees, which makes it so that the path to the local flake is not directly a nix store path anymore. But also, that Nix branch would break the use case this intends to fix, because baseNameOf
for lazy paths (default for flakes) returns something like "virtual0000000000000000000000004-source". While I reported an issue with exactly this in that PR, all the proposed solutions would also break the code here (and if you consider paths to always be relative to a filesystem root, it also makes sense). In general I'd say that we shouldn't split "/nix/store/-" paths and try to infer something from that, it's not very sound in general, not just for the lazy trees PR.
On the other hand, @roberth's proposed solution of just always using "source"
as the name and deferring changing of that name to other modifier functions sounds much better, I think we should go for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gone ahead and just used the name "source" for everything.
lib/sources.nix
Outdated
cleanSourceWith { | ||
inherit name src; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines can be simplified to filter
after the other changes.
Would it make sense to relativize the paths with respect to project root? This way someone can include/exclude directories "for free" with this api, for instance by using (If paths are not relativized, then you could test for |
Good point. It is unclear to me why |
Path values in Nix are always absolute. I don't think diverging from that is a great idea, because it creates a patch interface, compared to the other source functions. |
well for one,
outputs |
(Relative paths also improves purity, since strictly speaking otherwise one can write a filter over e.g. |
@roberth If it's easy enough to demonstrate, can you give an example of the patch interference in code? |
Okay so I realized flakes actually fix the impurity issue of using absolute paths, since everything is moved to the nix store for evaluation, however, I still do think the relative path API has superior ergonomics. |
AFAIU, NixOS/nix#6530 makes it so flakes can be evaluated without copying them to the nix store first. |
Well if that's the case, the potentially the impurity argument still stands. |
I have pushed a new version with a lot of the changes made that had been pointed out. It is also much faster than before. |
I recently developed a safer and easier-to-use abstraction for source filtering in a draft PR to Nixpkgs, please take a look, try it out, and give feedback! https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117 In particular the use case of this issue is covered with the generic let
fs = lib.fileset;
in
fs.fileFilter (file: file.ext == "html") ./. Though we can also easily add a convenience function like |
@infinisil in your example, does that just get all the html files directly in |
@ursi It traverses the entire |
Oh I'm just realizing that I answered something different than I think you asked. So the answer is that it only copies the html files into the Nix store. The other files are traversed recursively but not copied. |
I finally looked more into the fileset library. It does have one restriction that this does not - it doesn't work on any path, only local path values. So you cannot filter the result of another derivation. |
@ursi Indeed. What's your use case for filtering a store path? I'm wondering because maybe we could also have library functions for that, see also #264541. Note that if you need to filter a It does make a lot of sense for |
Well, the toy use case that made me realize this limitation was me trying to use your library to filter over all of nixpkgs to compare its performance with this function. I just tried pulling in nixpkgs from my project's flake input in the repl. |
@ursi I see, we obviously can't justify this as a use case for that feature, but if you have a practical use case let me know in #264541. I'd be also very happy to get feedback on the library in general, probably best to open issues and ping me if you encounter problems :) Regarding performance, I've done a benchmark before, and indeed it's less performant, however it's not possible to improve it without updating the builtins, see NixOS/nix#8820. In fact it's the same problem you've had to work around in this very PR here (doing a recursive In any case, I've done my best to optimise the file set library as much as possible without any such builtin, there's even a benchmark to check for regressions. |
I'll close this PR, since the file set library is fairly usable now. See #266356 for status, updates and feature requests! |